Data visualization is an essential tool for communicating ideas effectively (Antony Unwin, 2020). With a large data set, the main ideas are more often than not better conveyed by a clear, understandable visualization (Mathieu Stark, 2020).
There are plenty of visualization packages for R, and plotly was among the top R data visualization packages of 2020 (Harkiran Kaur, 2020). For this vignette, I will therefore use plotly to showcase a few ways to create interactive web plots. Note that plotly offers many more functions than the ones covered here; see the plotly R documentation should you wish to explore them.
The packages below are needed to run this R vignette. If they are already installed, there is no need to go through this step, so feel free to skip it.
# to install any of the packages below, remove the '#' at the start of its line
# install.packages("tidyverse")
# install.packages("plotly")
# install.packages("magrittr")
Make sure a package is installed before loading it: library() throws an error if it cannot find the package.
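If you prefer not to check each package by hand, the minimal sketch below lists which of the required packages are missing. It relies on base R's requireNamespace(); the helper name missing_packages() is hypothetical and not part of any package. Following the convention above, the install line is left commented out.

```r
# Hypothetical helper: returns the names of packages that cannot be loaded
missing_packages <- function(pkgs) {
  # requireNamespace() tries to load each package and returns FALSE
  # (quietly) when it cannot, e.g. because the package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "plotly", "magrittr"))

# remove the '#' below to install only the packages that are missing
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first means that re-running the setup does not download packages you already have.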
This section loads the packages required for this R vignette, using the function library().
library(plotly)
library(tidyverse)
library(magrittr)
Note: tidyverse is loaded because it bundles a wide variety of R packages commonly used for data analysis (including, but not limited to, packages for data manipulation and tidy data) (Wickham et al., 2019).
Note: magrittr is also loaded because it provides the pipe operator %>% and other operators that are useful when coding in R (Stefan Milton Bache, 2014).
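To make the pipe concrete: x %>% f() is equivalent to f(x), so a chain of pipes reads left to right instead of inside out. The small sketch below uses a made-up numeric vector and also shows magrittr's assignment pipe %<>%, which pipes a value into a function and assigns the result back.

```r
library(magrittr)

nums <- c(1, 4, 9, 16)  # made-up example values

# Nested call, read inside out: square roots first, then the sum
nested <- sum(sqrt(nums))

# The same computation with %>%, read left to right
piped <- nums %>% sqrt() %>% sum()

# %<>% pipes nums into sqrt() and assigns the result back to nums
nums %<>% sqrt()
```

Both nested and piped hold the same value; the piped form simply mirrors the order in which the steps happen.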
As a ramen enthusiast, I picked the CSV file from a ramen ratings dataset on Kaggle. I downloaded the CSV file and saved it in a folder called “data” inside my working directory.
Before we can take a closer look at plotly, we need some data to analyse. To load the ramen data I downloaded (Aleksey Bilogur, 2018), I used the read_csv() function from readr (one of the packages loaded by tidyverse) and stored the result as ramen.
ramen <- read_csv("data/ramen-ratings.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## `Review #` = col_double(),
## Brand = col_character(),
## Variety = col_character(),
## Style = col_character(),
## Country = col_character(),
## Stars = col_character(),
## `Top Ten` = col_character()
## )
head(ramen, n = 5) # shows the first five rows of the tibble
## # A tibble: 5 × 7
## `Review #` Brand Variety Style Country Stars `Top Ten`
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 2580 New Touch T's Restaurant Tantan… Cup Japan 3.75 <NA>
## 2 2579 Just Way Noodles Spicy Hot Ses… Pack Taiwan 1 <NA>
## 3 2578 Nissin Cup Noodles Chicken V… Cup USA 2.25 <NA>
## 4 2577 Wei Lih GGE Ramen Snack Tomat… Pack Taiwan 2.75 <NA>
## 5 2576 Ching's Secret Singapore Curry Pack India 3.75 <NA>
Before we can use this data for any kind of plot, it needs some preparation. One way to prepare your data is to make sure it follows the tidy data principles:
"In tidy data:
Every column is a variable.
Every row is an observation.
Every cell is a single value."
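As an illustration of the first rule, the sketch below uses a small hypothetical table (the numbers are made up) in which the ramen styles are spread across columns. tidyr's pivot_longer(), which is loaded with the tidyverse, reshapes it so that each column is a variable and each row is one Country/Style observation.

```r
library(tibble)
library(tidyr)

# Hypothetical untidy table: "Cup" and "Pack" are values of a Style
# variable, but here they are used as column names
wide <- tibble(
  Country = c("Japan", "Taiwan"),
  Cup     = c(3.5, 3.1),
  Pack    = c(4.0, 3.8)
)

# Tidy version: one column per variable (Country, Style, Stars),
# one row per observation
long <- pivot_longer(wide, cols = c(Cup, Pack),
                     names_to = "Style", values_to = "Stars")
long
```

The ramen CSV used in this vignette already has Style and Stars as their own columns, so no reshaping of this kind is needed for it.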
For the plotly plot I have in mind, I need Stars to be numeric (the column specification above shows it was read in as character). Based on the head() output above, I also only need the Brand, Style, Country and Stars columns. Finally, I will remove any row that contains an NA.
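ramen$Stars <- as.numeric(ramen$Stars) # any non-numeric entries become NA (with a warning)
ramen_sub <- select(ramen, c(Brand, Style, Country, Stars)) %>%
  drop_na() # drop rows with a missing value in any of these columns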
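Because as.numeric() turns any text it cannot parse into NA (with a "NAs introduced by coercion" warning), those rows are then silently removed by drop_na(). If you want to see which values were affected before converting, here is a minimal sketch on a small hypothetical vector; the entry "Unrated" is an assumed example, not something shown in the output above.

```r
# Hypothetical character ratings, as read_csv() might return them
stars_raw <- c("3.75", "1", "Unrated", "2.25", NA)

# Convert, suppressing the coercion warning because we inspect the result
stars_num <- suppressWarnings(as.numeric(stars_raw))

# Values that were present but could not be converted to numbers
unparsed <- unique(stars_raw[is.na(stars_num) & !is.na(stars_raw)])
unparsed
```

On the real data, the same expression can be run on ramen$Stars before it is overwritten with its numeric version.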
A few more simple checks to see if the data is ready for plotting:
head(ramen_sub) # shows the first six rows (the default) of the tibble
## # A tibble: 6 × 4
## Brand Style Country Stars
## <chr> <chr> <chr> <dbl>
## 1 New Touch Cup Japan 3.75
## 2 Just Way Pack Taiwan 1
## 3 Nissin Cup USA 2.25
## 4 Wei Lih Pack Taiwan 2.75
## 5 Ching's Secret Pack India 3.75
## 6 Samyang Foods Pack South Korea 4.75
str(ramen_sub) # shows the internal structure of this tibble
## tibble [2,575 × 4] (S3: tbl_df/tbl/data.frame)
## $ Brand : chr [1:2575] "New Touch" "Just Way" "Nissin" "Wei Lih" ...
## $ Style : chr [1:2575] "Cup" "Pack" "Cup" "Pack" ...
## $ Country: chr [1:2575] "Japan" "Taiwan" "USA" "Taiwan" ...
## $ Stars : num [1:2575] 3.75 1 2.25 2.75 3.75 4.75 4 3.75 0.25 2.5 ...
summary(ramen_sub) # shows descriptive statistics for each column of the tibble
## Brand Style Country Stars
## Length:2575 Length:2575 Length:2575 Min. :0.000
## Class :character Class :character Class :character 1st Qu.:3.250
## Mode :character Mode :character Mode :character Median :3.750
## Mean :3.655
## 3rd Qu.:4.250
## Max. :5.000
For the sake of this example, I will summarise Stars as its mean for each combination of Country and Style, rounded to 2 decimal places.
final_ramen <- ramen_sub %>%
  group_by(Country, Style) %>%
  summarise(Stars = round(mean(Stars), digits = 2), .groups = "drop") # mean rating per Country/Style pair, ungrouped result
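To see what group_by() followed by summarise() produces, here is the same kind of pipeline applied to a tiny hypothetical tibble (the values are made up). Each Country/Style pair collapses into a single row holding its rounded mean rating, and .groups = "drop" returns an ungrouped tibble.

```r
library(dplyr)
library(tibble)

# Hypothetical reviews: two Japanese cups, one Japanese pack, one US cup
toy <- tibble(
  Country = c("Japan", "Japan", "Japan", "USA"),
  Style   = c("Cup", "Cup", "Pack", "Cup"),
  Stars   = c(3, 4.5, 5, 2.25)
)

toy_summary <- toy %>%
  group_by(Country, Style) %>%
  summarise(Stars = round(mean(Stars), digits = 2), .groups = "drop")
toy_summary
# Japan/Cup: (3 + 4.5) / 2 = 3.75; Japan/Pack: 5; USA/Cup: 2.25
```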
For this example, I chose a bubble plot to visualize the ramen data. A bubble plot is essentially a scatter plot in which the size and colour of each “bubble” can be mapped to additional variables; here, both represent the average star rating.
The fun part of plotly visuals is that you can interact with them (Plotly R Open Source Graphing Library). Try hovering over the plot, or zooming in and out.
fig1 <- plot_ly(final_ramen, x = ~Country, y = ~Style,
                color = ~Stars, colors = 'viridis', # colour scale is a plot_ly() argument, not a marker attribute
                size = ~Stars, text = ~Stars,        # text supplies the value shown by %{text}
                type = 'scatter', mode = 'markers',
                hovertemplate = paste0(
                  'Stars: %{text}',
                  '<br>%{x}<extra></extra>'
                ),
                marker = list(
                  opacity = 0.5 # semi-transparent bubbles
                )
)
fig1 <- fig1 %>% layout(title = 'Ramen Ratings by Country of Origin and Style',
                        xaxis = list(showgrid = FALSE),
                        yaxis = list(showgrid = FALSE)
)
fig1
A few notes on the bubble plot code above:
- type = 'scatter' and mode = 'markers' are the crucial arguments that give the plot its “bubble” look.
- The marker = list(...) portion can be customized depending on how you would like your “bubbles” to look.
- showgrid = FALSE hides the grid lines of the axis it is set on.
One of my favourite plotly plots is the choropleth (a map-based interactive plot).
country_ramen <- ramen_sub %>%
  group_by(Country) %>%
  summarise(Stars = round(mean(Stars), digits = 2)) # one average per country, so each location has a single value
fig2 <- plot_ly(country_ramen, # data to plot
                type = 'choropleth', # specifies a choropleth plot
                locations = ~Country, # column that contains locations
                locationmode = 'country names', # locations are given as country names
                z = ~Stars, zmin = 1, zmax = 5, # value mapped to the fill colour, on a fixed 1 to 5 range
                colorscale = 'Viridis', # colour scale used for the fill and the colorbar
                marker = list(
                  line = list(
                    color = "grey", width = 0.5 # border lines separating the locations
                  )
                )
)
fig2 <- fig2 %>% layout(title = 'Avg Ramen Ratings by Country of Origin')
fig2 <- fig2 %>% colorbar(title = "Avg Star Rating",
                          len = 0.5, y = 0, yanchor = "bottom" # half-height colorbar at the bottom right; colorbars have no 'position' attribute
)
fig2
The code for choropleths has a slightly steep learning curve; more examples can be found in the plotly R documentation.
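One way to get past that learning curve is to start from the few arguments a choropleth needs and add styling afterwards. The minimal sketch below uses hypothetical country averages (the numbers are made up) and only tells plotly which column holds the locations, how they are named, and which value drives the fill colour.

```r
library(plotly)

# Hypothetical country-level averages
avg <- data.frame(
  Country = c("Japan", "Taiwan", "India"),
  Stars   = c(3.9, 3.7, 3.4)
)

# Minimal choropleth: locations, how they are named, and the value to colour by
fig_min <- plot_ly(avg, type = "choropleth",
                   locations = ~Country, locationmode = "country names",
                   z = ~Stars)
```

Typing fig_min at the console renders the map; arguments such as colorscale, marker and zmin/zmax can then be layered on one at a time.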
As you can see, interactive web graphs provide a fun and informative way to visualize your data. What you see in this vignette is only a small sample of what plotly has to offer.